Elsevier - Publisher Connector

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

Author: C Hohlweg
CSJA Nash-Williams
D Kosolobov
GS Brodal
H Barcelo
J Fischer
M Crochemore
M Giraud
SJ Puglisi
W Rytter
Publication date: 01/01/2016

Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et al. (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an $O(n (\log n)^{2/3})$-time algorithm for answering $O(n)$ LCE queries. This result was improved by Gawrychowski et al. (accepted to CPM 2016) to $O(n \log \log n)$ time. In this work we note a special non-crossing property of LCE queries asked in the runs computation. We show that any $n$ such non-crossing queries can be answered on-line in $O(n \alpha(n))$ time, which yields an $O(n \alpha(n))$-time algorithm for computing runs
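To make the two notions in this abstract concrete, here is a minimal brute-force sketch. It is not the algorithm of the paper: `lce` compares characters directly, and `runs` uses only LCE queries of the form (i, i + p) to list every maximal repetition. The function names and the half-open `(start, end, period)` output format are assumptions made for this illustration.

```python
def lce(s, i, j):
    """Longest common extension: length of the longest common prefix
    of s[i:] and s[j:], found by direct character comparison."""
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


def runs(s):
    """Return all runs of s as sorted (start, end, period) triples.

    A run is a maximal substring s[start:end] whose smallest period p
    satisfies end - start >= 2 * p. Quadratic-time illustration only.
    """
    n = len(s)
    found = {}  # (start, end) -> smallest period
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            ext = lce(s, i, i + p)
            if ext >= p:
                # s[i : i + p + ext] has period p and length >= 2p.
                # Periods are tried in increasing order, so the first
                # period recorded for an interval is its smallest one.
                found.setdefault((i, i + p + ext), p)
            # Position i + ext is a mismatch (or the end of the string),
            # so the next candidate start is left-maximal.
            i += ext + 1
    return sorted((a, b, p) for (a, b), p in found.items())


print(runs("aabaabaa"))
```

For `"aabaabaa"` this reports the three runs `aa` with period 1 and the whole string with period 3.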

King's Research Portal

Hal-Diderot

HAL - UPEC / UPEM

A really simple approximation of smallest grammar

Author: A. Jeż
F. Rubin
H. Sakamoto
J. Kärkkäinen
M. Charikar
M. Lohrey
W. Rytter
Publication date: 01/01/2014

In this paper we present a really simple linear-time algorithm constructing a context-free grammar of size $O(g \log(N/g))$ for the input string, where $N$ is the size of the input string and $g$ the size of the optimal grammar generating this string. The algorithm works for alphabets of arbitrary size, but the running time is linear assuming that the alphabet $\Sigma$ of the input string can be identified with numbers from $1, \ldots, N^c$ for some constant $c$. Algorithms with such an approximation guarantee and running time are known; however, all of them were non-trivial and their analyses were involved. The algorithm presented here computes the LZ77 factorisation and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most $l$ factors as well as additional $O(l)$ letters, where $l$ was the size of the original LZ77 factorisation. In one phase, in a greedy way (by a left-to-right sweep and with the help of the factorisation), we choose a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. We choose at least 2/3 of the letters in the word and there are $O(l)$ many different pairs among them. Hence there are $O(\log N)$ phases, each of which introduces $O(l)$ nonterminals to the grammar. A more precise analysis yields a bound of $O(l \log(N/l))$. As $l \leq g$, this yields the desired bound $O(g \log(N/g))$. Comment: Accepted for CPM 201
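The starting point of the algorithm above is the LZ77 factorisation. The following is a minimal sketch of one common variant (self-referential, where a factor may overlap its earlier occurrence), computed naively in quadratic time or worse rather than in linear time as in the paper. The function name `lz77_factorise` and the choice to return factors as plain strings are assumptions for this illustration.

```python
def lz77_factorise(s):
    """Naive LZ77 factorisation.

    Each factor is either a single letter that has not been seen before,
    or the longest prefix of the remaining suffix that also starts at an
    earlier position (overlap with the current position is allowed).
    """
    n = len(s)
    factors = []
    i = 0
    while i < n:
        best = 0
        for j in range(i):  # candidate earlier starting positions
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # a fresh letter forms a factor of length 1
        factors.append(s[i:i + length])
        i += length
    return factors


print(lz77_factorise("abababb"))
```

The number of factors returned corresponds to the quantity $l$ in the abstract, under this particular definition of the factorisation.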

MPG.PuRe

Online Pattern Matching for String Edit Distance with Moves

Author: D. Shapira
G. Navarro
J. Kececioglu
R. Clifford
S. Maruyama
V. Bafna
V.I. Levenshtein
W. Rytter
Publication date: 01/01/2014

Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinary editing operations to turn one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on parsing discrepancies between different appearances of the same substring in a string. ESP can be used for computing an approximate EDM as the L1 distance between characteristic vectors built from node labels in parse trees. However, ESP is not applicable to streaming text data where the whole text is unknown in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For the space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results on dynamic succinct trees. We experimentally test OESP on the ability to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency. Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE2014
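The L1 distance between characteristic vectors mentioned above can be sketched as follows. This does not implement ESP or OESP. It assumes the node labels of two parse trees are already available as plain lists, and the name `l1_distance` is chosen only for this illustration.

```python
from collections import Counter


def l1_distance(labels_a, labels_b):
    """L1 distance between the characteristic vectors of two label lists.

    The characteristic vector counts how often each label occurs; the
    distance sums the absolute differences of those counts.
    """
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Counter returns 0 for missing labels, so the union of keys suffices.
    return sum(abs(count_a[x] - count_b[x]) for x in count_a.keys() | count_b.keys())


# Hypothetical node labels of two small parse trees.
print(l1_distance(["A", "B", "B"], ["B", "C"]))
```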

Syntactic View of Sigma-Tau Generation of Permutations

Author: F Ruskey
J Sawada
Joe Sawada
PF Dietz
W Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/03/2019

We give a syntactic view of the Sawada-Williams $(\sigma,\tau)$-generation of permutations. The corresponding sequence of $\sigma$-$\tau$ operations, of length $n!-1$, is shown to be highly compressible: it has an $O(n^2\log n)$-bit description. Using this compact description we design fast algorithms for ranking and unranking permutations. Comment: accepted on LATA201
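As a small illustration of the two operations, the sketch below assumes the usual reading in which $\sigma$ rotates a permutation one position to the left and $\tau$ swaps its first two entries. It does not reproduce the Sawada-Williams ordering or the ranking algorithms of the paper. It only checks by breadth-first search that every permutation is reachable using these two operations. The function names are assumptions for this example.

```python
from collections import deque


def sigma(p):
    """Rotate the permutation (a tuple) one position to the left."""
    return p[1:] + p[:1]


def tau(p):
    """Swap the first two entries of the permutation (needs len(p) >= 2)."""
    return (p[1], p[0]) + p[2:]


def reachable(n):
    """All permutations of 1..n reachable from the identity via sigma and tau."""
    start = tuple(range(1, n + 1))
    seen = {start}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in (sigma(p), tau(p)):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen


print(len(reachable(4)))
```

For $n = 4$ the search reaches all $4! = 24$ permutations, which is consistent with a generation sequence of length $n! - 1$.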

On the maximal number of cubic subwords in a string

Author: A. Apostolico
A. Thue
A.S. Fraenkel
C.S. Iliopoulos
D. Damanik
L. Ilie
M. Crochemore
M. Giraud
M. Lothaire
M.G. Main
N.J. Fine
P. Baturo
R.M. Kolpakov
S.J. Puglisi
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We investigate the problem of the maximum number of cubic subwords (of the form $www$) in a given word. We also consider square subwords (of the form $ww$). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares or which are highly repetitive; then we provide a nontrivial estimation for the number of cubes. We show that the maximum number of squares $xx$ such that $x$ is not a primitive word (nonprimitive squares) in a word of length $n$ is exactly $\lfloor \frac{n}{2}\rfloor - 1$, and the maximum number of subwords of the form $x^k$, for $k\ge 3$, is exactly $n-2$. In particular, the maximum number of cubes in a word is not greater than $n-2$ either. Using very technical properties of occurrences of cubes, we improve this bound significantly. We show that the maximum number of cubes in a word of length $n$ is between $(1/2)n$ and $(4/5)n$. (In particular, we improve the lower bound from the conference version of the paper.) Comment: 14 page
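The quantities counted above can be explored on small words with a brute-force sketch. It enumerates the distinct subwords of the form $x^k$ for a fixed $k$ (squares for $k = 2$, cubes for $k = 3$). It is an illustration only, not a method from the paper, and the name `distinct_powers` is an assumption.

```python
def distinct_powers(word, k):
    """Set of distinct subwords of `word` of the form x^k with x nonempty."""
    n = len(word)
    found = set()
    for length in range(1, n // k + 1):          # length of the root x
        for i in range(n - k * length + 1):      # start of the candidate
            x = word[i:i + length]
            if word[i:i + k * length] == x * k:
                found.add(x * k)
    return found


print(sorted(distinct_powers("aaaaaa", 3)))   # cubes
print(sorted(distinct_powers("aabaab", 2)))   # squares
```

On the word `aaaaaa` of length 6 this finds two cubes, `aaa` and `aaaaaa`, which stays within the $n - 2$ bound stated in the abstract.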

Online Self-Indexed Grammar Compression

Author: F Claude
G Cormode
G Navarro
M Karpinski
S Maruyama
T Gagie
W Rytter
Y Takabatake
Publication date: 06/07/2015

Although several grammar-based self-indexes have been proposed thus far, their applicability is limited to offline settings where whole input texts are prepared in advance, so index structures must be rebuilt whenever additional inputs arrive, which is often the case in the big data era. In this paper, we present the first online self-indexed grammar compression, named OESP-index, which can gradually build the index structure by reading input characters one by one. This property also saves working space during construction, because we do not need to store input texts in memory. We experimentally test OESP-index on the ability to build index structures and search query texts, and we show OESP-index's efficiency, especially space-efficiency for building index structures. Comment: To appear in the Proceedings of the 22nd edition of the International Symposium on String Processing and Information Retrieval (SPIRE2015

Fast Searching in Packed Strings

Author: A. Amir
D.E. Knuth
E.W. Myers
G. Navarro
J. Tarhio
K. Fredriksson
R. Baeza-Yates
R.A. Baeza-Yates
R.M. Karp
R.S. Boyer
S. Wu
S.T. Klein
T.A. Welch
V.L. Arlazarov
W. Masek
W. Rytter
Publication date: 01/01/2009

Given strings $P$ and $Q$, the (exact) string matching problem is to find all positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let $m \leq n$ be the lengths of $P$ and $Q$, respectively, and let $\sigma$ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time $O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right)$. Here $\mathrm{occ}$ is the number of occurrences of $P$ in $Q$. For $m = o(n)$ this improves the $O(n)$ bound of the Knuth-Morris-Pratt algorithm. Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal, since any algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation. Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200
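For reference, the classical Knuth-Morris-Pratt baseline that the abstract improves on can be sketched as below. This is the textbook one-character-at-a-time algorithm, not the packed-string algorithm of the paper. The function name `kmp_search` and the convention of returning 0-based starting positions are assumptions for this illustration.

```python
def kmp_search(pattern, text):
    """Return the 0-based starting positions of all occurrences of
    `pattern` in `text`, using the Knuth-Morris-Pratt failure function."""
    m = len(pattern)
    if m == 0:
        return []

    # fail[i] = length of the longest proper border of pattern[:i + 1].
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    # Scan the text once; k is the length of the current partial match.
    occurrences = []
    k = 0
    for j, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:
            occurrences.append(j - m + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return occurrences


print(kmp_search("aba", "ababa"))
```

Each text character is read exactly once, which is the $O(n)$ behaviour the abstract takes as its point of comparison.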

Elsevier - Publisher Connector

Online Research Database In Technology

Capturing metal-support interactions in situ during the reduction of a Re promoted Co/γ-Al₂O₃ catalyst

Author: Holmen A.
Johnsen Rune E.
Rytter E.
Rønning M.
Tsakoumis N. E.
van Beek W.
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 01/01/2016

Online Research Database In Technology

Faster subsequence recognition in compressed strings

Author: A Tiskin
A. Tiskin
BW Watson
CER Alves
G Myers
G Navarro
G Ziv
J Kärkkäinen
JL Bentley
M Crochemore
P Cégielski
TA Welch
W Rytter
WJ Masek
Publication date: 18/01/2008

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which are closely related to Lempel-Ziv compression. For an SLP-compressed text of length $\bar m$, and an uncompressed pattern of length $n$, Cégielski et al. gave an algorithm for local subsequence recognition running in time $O(\bar m n^2 \log n)$. We improve the running time to $O(\bar m n^{1.5})$. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time $O(\bar m n^{1.5})$; the same problem with a compressed pattern is known to be NP-hard
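To illustrate the objects involved, the sketch below represents a straight-line program as a dictionary of rules, expands it, and computes the longest common subsequence length with the standard dynamic program. This is the naive decompress-then-solve baseline, not the algorithm of the paper, and its cost depends on the uncompressed text length rather than on $\bar m$. The rule format and the names `expand` and `lcs_length` are assumptions for this example.

```python
def expand(rules, symbol, cache=None):
    """Expand an SLP symbol into the string it generates.

    `rules` maps each nonterminal either to a terminal string or to a
    pair (left, right) of nonterminals (hypothetical format).
    """
    if cache is None:
        cache = {}
    if symbol not in cache:
        rhs = rules[symbol]
        if isinstance(rhs, str):
            cache[symbol] = rhs
        else:
            left, right = rhs
            cache[symbol] = expand(rules, left, cache) + expand(rules, right, cache)
    return cache[symbol]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


# A small SLP generating "ababab".
rules = {"A": "a", "B": "b", "C": ("A", "B"), "D": ("C", "C"), "E": ("D", "C")}
text = expand(rules, "E")
print(text, lcs_length(text, "bba"))
```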